The Proof of McAllester’s PAC-Bayesian Theorem
Author

Abstract
Extends McAllester’s PAC-Bayesian theorem [4], allowing for tail bounds other than Hoeffding’s, and somewhat simplifies the proof.

1 McAllester’s PAC-Bayesian theorem

McAllester’s PAC-Bayesian theorem is an astonishing result. Although its proof requires only elementary techniques, it can be used to obtain very tight generalization error bounds for the Gibbs versions of (approximate) Bayesian classification models. In special cases, which are often practically relevant, we can obtain bounds for the corresponding Bayes versions as well.

Suppose we are given a concept space C and a space of labeled examples X. We denote concepts by c ∈ C and examples by x ∈ X. Suppose also that we are given a loss function l(c, x), mapping pairs (c, x) to the interval [a, b] or [a, b) (here, 0 ≤ a < b). There is an unknown data distribution D from which examples x are drawn, and we make no assumptions on D. For each concept c, we define the expected loss l(c) = E[l(c, x)], where the expectation is taken over x ∼ D. Given a sample S = {x_1, …, x_m} of size m, drawn independently and identically distributed (i.i.d.) from D, we also define the empirical loss

  l̂(c) = (1/m) ∑_{i=1}^m l(c, x_i).

Note that E_S[l̂(c)] = l(c). Depending on the loss function, it is possible to bound the probability (over S) of large deviations between l̂(c) and its expected value l(c). McAllester’s technique takes such a large deviation bound for single concepts c and transforms it into a bound for Gibbs concepts, which in turn depend on a distribution over all concepts in C (this is introduced in detail below). Therefore, we need to assume the availability of a large deviation bound for single l̂ = l̂(c), l = l(c). Namely, there have to exist constants C > 0, β > 0 and a function φ on (a, b) × (a, b) s.t. Pr {
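The large deviation assumption above can be made concrete with Hoeffding’s bound, the case the original theorem covers. The following is a minimal simulation sketch, not part of the paper: losses l(c, x) are taken to be i.i.d. Bernoulli(p) values in [0, 1] (so l(c) = p), and the values of m, p and ε are purely illustrative. It estimates Pr_S{ l̂(c) − l(c) ≥ ε } over many samples S and compares it with Hoeffding’s one-sided tail bound exp(−2 m ε²).

```python
import math
import random

random.seed(0)

def empirical_loss(m, p):
    """l_hat(c) for one sample S: mean of m i.i.d. Bernoulli(p) losses."""
    return sum(random.random() < p for _ in range(m)) / m

# Illustrative values (not from the paper): sample size, expected loss,
# deviation threshold, and the number of simulated samples S.
m, p, eps, trials = 200, 0.3, 0.05, 5000

# Fraction of samples S on which the empirical loss overshoots l(c) by eps.
deviation_freq = sum(
    empirical_loss(m, p) - p >= eps for _ in range(trials)
) / trials

hoeffding = math.exp(-2 * m * eps ** 2)  # one-sided Hoeffding tail bound

print(f"estimated deviation probability: {deviation_freq:.4f}")
print(f"Hoeffding bound:                 {hoeffding:.4f}")
```

For these values the bound equals exp(−1) ≈ 0.368, while the simulated deviation frequency is much smaller; Hoeffding’s bound holds for any distribution of bounded losses, which is exactly why it needs no assumptions on D.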
Similar resources
Gaussian Processes Classification and its PAC-Bayes Generalization Error Bounds – CSE 291 Project Report
McAllester’s PAC-Bayes theorem (strengthened by [4]) characterizes the convergence of a stochastic classifier’s empirical error to its generalization error. Fixing one "prior" distribution P(h) over the hypothesis space H, the theorem holds for all "posterior" distributions Q(h) over H simultaneously, so in practice we can find a data-dependent posterior distribution over H as the distribution of ...
Supplementary Material to A PAC-Bayesian Approach for Domain Adaptation with Specialization to Linear Classifiers
In this document, Section 1 contains some lemmas used in subsequent proofs, Section 2 presents an extended proof of the bound on the domain disagreement dis_ρ(D_S, D_T) (Theorem 3 of the main paper), Section 3 introduces other PAC-Bayesian bounds for dis_ρ(D_S, D_T) and R_{P_T}(G_ρ), Section 4 shows equations and implementation details about PBDA (our proposed learning algorithm for PAC-Bayesian DA t...
A Note on the PAC Bayesian Theorem
We prove general exponential moment inequalities for averages of [0, 1]-valued i.i.d. random variables and use them to tighten the PAC Bayesian Theorem. The logarithmic dependence on the sample count in the numerator of the PAC Bayesian bound is halved.
PAC-Bayesian Bounds based on the Rényi Divergence
We propose a simplified proof process for PAC-Bayesian generalization bounds, which allows the proof to be divided into four successive inequalities, easing the "customization" of PAC-Bayesian theorems. We also propose a family of PAC-Bayesian bounds based on the Rényi divergence between the prior and posterior distributions, whereas most PAC-Bayesian bounds are based on the Kullback-Leibler divergence....
A new proof for the Banach-Zarecki theorem: A light on integrability and continuity
To demonstrate more visibly the close relation between continuity and integrability, a new proof for the Banach-Zarecki theorem is presented on the basis of the Radon-Nikodym theorem, which emphasizes the measure-type properties of the Lebesgue integral. The Banach-Zarecki theorem says that a real-valued function $F$ is absolutely continuous on a finite closed interval if and only if it is continuo...